PISA Data Exploration

by Ken Norton

PISA is a survey of students' skills and knowledge as they approach the end of compulsory education. It is not a conventional school test. Rather than examining how well students have learned the school curriculum, it looks at how well prepared they are for life beyond school.

Around 510,000 students in 65 economies took part in the PISA 2012 assessment of reading, mathematics and science, representing about 28 million 15-year-olds globally.

My questions:

  • Does the age at which a child begins learning the test language affect their performance?
  • Do children in freer countries perform differently than those in less free countries?
  • Does economic, social, and cultural status – advantage vs. disadvantage – relate to academic performance?

Preliminary Wrangling

Load Data

In [1]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

%config InlineBackend.figure_format = 'retina'
%matplotlib inline
In [2]:
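# Plot styles (the second call layers its settings on top of the first)
plt.style.use('fivethirtyeight')
# Note: Matplotlib 3.6+ renames this style to 'seaborn-v0_8-poster'
plt.style.use('seaborn-poster')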

# We have a lot of columns in our data set
pd.set_option('display.max_columns', 650)
pd.set_option('display.max_rows', 650)
pd.set_option('display.width', 1000)
In [3]:
# low_memory=False makes pandas infer each column's dtype from the whole file,
# which suppresses the mixed-type DtypeWarning (a warning, not an error)
# per: https://stackoverflow.com/a/40585000
df = pd.read_csv('data/pisa2012.csv', low_memory=False, encoding='latin1')
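
Reading all 636 columns with `low_memory=False` is simple but memory-hungry. A lighter alternative, sketched here on a tiny in-memory CSV rather than the real file, is to load only the columns of interest and pin their dtypes up front. The column subset, the `EXTRA` column, and the values are assumptions made for illustration, not the notebook's actual selection.

```python
import io
import pandas as pd

# Tiny stand-in for data/pisa2012.csv; the values are made up for illustration.
csv_text = """CNT,ST04Q01,AGE,PV1MATH,EXTRA
Australia,Female,15.5,514.4,x
Canada,Female,15.83,470.7,7
Colombia,Male,15.75,369.4,y
"""

# Hypothetical subset of columns; adjust to the questions being asked.
wanted = ["CNT", "ST04Q01", "AGE", "PV1MATH"]
dtypes = {
    "CNT": "category",
    "ST04Q01": "category",
    "AGE": "float64",
    "PV1MATH": "float64",
}

# usecols skips the mixed-type EXTRA column entirely, so no dtype guessing is needed for it.
small = pd.read_csv(io.StringIO(csv_text), usecols=wanted, dtype=dtypes)
print(small.dtypes)
```

With the real file, the same `usecols` and `dtype` arguments could be passed alongside `encoding='latin1'`.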
In [4]:
# Import the freedom scores
df_free = pd.read_csv('data/Freedom_in_the_World_2012.csv')
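
To relate freedom scores to student results, the two tables eventually need to be joined on country. The sketch below shows one way to do that and to spot country names that fail to match. The `Country` and `Status` column names and the rows are placeholders, since the layout of the freedom file is not shown here.

```python
import pandas as pd

# Toy stand-ins for df and df_free; names and values are placeholders.
pisa = pd.DataFrame(
    {"CNT": ["Australia", "Canada", "Colombia", "Hong Kong-China"]}
)
free = pd.DataFrame(
    {
        "Country": ["Australia", "Canada", "Colombia", "Hong Kong"],
        "Status": ["Free", "Free", "Partly Free", "Partly Free"],
    }
)

# A left merge keeps every student row; indicator=True adds a _merge column
# recording whether a matching country was found in the freedom table.
merged = pisa.merge(
    free, how="left", left_on="CNT", right_on="Country", indicator=True
)

# Countries whose spelling differs between the two sources need manual mapping.
unmatched = merged.loc[merged["_merge"] == "left_only", "CNT"].unique().tolist()
print(unmatched)
```

Any names listed in `unmatched` can be reconciled with a small rename dictionary before merging again.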

Assessment

In [5]:
# High level inspection
df.shape
Out[5]:
(485490, 636)
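
With 636 columns, a per-column summary is often easier to scan than raw rows. The sketch below, run on a toy frame rather than `df`, tabulates dtype, share of missing values, and distinct-value count for each column. The helper name `summarize_columns` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df; values are made up for illustration.
toy = pd.DataFrame(
    {
        "CNT": ["Australia", "Canada", "Colombia", "Canada"],
        "ST06Q01": [np.nan, 5.0, 6.0, np.nan],
        "PV1MATH": [514.4, 470.7, 369.4, 480.0],
    }
)


def summarize_columns(frame):
    """Return one row per column: dtype, share of missing values, distinct count."""
    summary = pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "missing_share": frame.isna().mean(),
            "n_unique": frame.nunique(),
        }
    )
    # Put the sparsest columns first so they are easy to spot.
    return summary.sort_values("missing_share", ascending=False)


summary = summarize_columns(toy)
print(summary)
```

Calling `summarize_columns(df)` on the full data set would give a 636-row table that can be filtered, for example to columns with more than half their values missing.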
In [6]:
df.sample(5)
Out[6]:
        Unnamed: 0  CNT                   SUBNATIO  STRATUM  OECD      NC                         SCHOOLID  STIDSTD  ...  WVARSTRR  VAR_UNIT  SENWGT_STU  VER_STU
30720   30721       Australia             360000    AUS0411  OECD      Australia                  462       8570     ...  59        2         0.1131      22NOV13
84210   84211       Canada                1240000   CAN0652  OECD      Canada                     397       9741     ...  72        2         0.1516      22NOV13
117907  117908      Colombia              1700000   COL0101  Non-OECD  Colombia                   148       3809     ...  23        1         0.0836      22NOV13
190097  190098      United Kingdom        8262000   GBR2003  OECD      United Kingdom (Scotland)  41        981      ...  7         1         0.0307      22NOV13
13199   13200       United Arab Emirates  7840100   ARE0101  Non-OECD  United Arab Emirates       334       8457     ...  13        1         0.1081      22NOV13

5 rows × 636 columns
In [7]:
df.describe()
Out[7]:
Unnamed: 0 SUBNATIO SCHOOLID STIDSTD ST01Q01 ST02Q01 ST03Q01 ST03Q02 ST06Q01 ST115Q01 ST21Q01 ST26Q15 ST26Q16 ST26Q17 ST57Q01 ST57Q02 ST57Q03 ST57Q04 ST57Q05 ST57Q06 ST69Q01 ST69Q02 ST69Q03 ST70Q01 ST70Q02 ST70Q03 ST71Q01 ST72Q01 ST101Q01 ST101Q02 ST101Q03 ST101Q05 ST104Q01 ST104Q04 ST104Q05 ST104Q06 IC05Q01 IC06Q01 IC07Q01 EC04Q01A EC04Q01B EC04Q01C EC04Q02A EC04Q02B EC04Q02C EC04Q03A EC04Q03B EC04Q03C EC04Q04A EC04Q04B EC04Q04C EC04Q05A EC04Q05B EC04Q05C EC04Q06A EC04Q06B EC04Q06C CLCUSE301 CLCUSE302 DEFFORT AGE GRADE ANXMAT ATSCHL ATTLNACT BELONG BFMJ2 BMMJ1 CLSMAN COGACT CULTDIST CULTPOS DISCLIMA ENTUSE ESCS EXAPPLM EXPUREM FAILMAT FAMCON FAMCONC FAMSTRUC HEDRES HERITCUL HISEI HOMEPOS HOMSCH HOSTCUL ICTATTNEG ICTATTPOS ICTHOME ICTRES ICTSCH INFOCAR INFOJOB1 INFOJOB2 INSTMOT INTMAT LANGCOMM LANGRPPD LMINS MATBEH MATHEFF MATINTFC MATWKETH MMINS MTSUP OPENPS OUTHOURS PARED PERSEV SCMAT SMINS STUDREL SUBNORM TCHBEHFA TCHBEHSO TCHBEHTD TEACHSUP TIMEINT USEMATH USESCH WEALTH ANCATSCHL ANCATTLNACT ANCBELONG ANCCLSMAN ANCCOGACT ANCINSTMOT ANCINTMAT ANCMATWKETH ANCMTSUP ANCSCMAT ANCSTUDREL ANCSUBNORM PV1MATH PV2MATH PV3MATH PV4MATH PV5MATH PV1MACC PV2MACC PV3MACC PV4MACC PV5MACC PV1MACQ PV2MACQ PV3MACQ PV4MACQ PV5MACQ PV1MACS PV2MACS PV3MACS PV4MACS PV5MACS PV1MACU PV2MACU PV3MACU PV4MACU PV5MACU PV1MAPE PV2MAPE PV3MAPE PV4MAPE PV5MAPE PV1MAPF PV2MAPF PV3MAPF PV4MAPF PV5MAPF PV1MAPI PV2MAPI PV3MAPI PV4MAPI PV5MAPI PV1READ PV2READ PV3READ PV4READ PV5READ PV1SCIE PV2SCIE PV3SCIE PV4SCIE PV5SCIE W_FSTUWT W_FSTR1 W_FSTR2 W_FSTR3 W_FSTR4 W_FSTR5 W_FSTR6 W_FSTR7 W_FSTR8 W_FSTR9 W_FSTR10 W_FSTR11 W_FSTR12 W_FSTR13 W_FSTR14 W_FSTR15 W_FSTR16 W_FSTR17 W_FSTR18 W_FSTR19 W_FSTR20 W_FSTR21 W_FSTR22 W_FSTR23 W_FSTR24 W_FSTR25 W_FSTR26 W_FSTR27 W_FSTR28 W_FSTR29 W_FSTR30 W_FSTR31 W_FSTR32 W_FSTR33 W_FSTR34 W_FSTR35 W_FSTR36 W_FSTR37 W_FSTR38 W_FSTR39 W_FSTR40 W_FSTR41 W_FSTR42 W_FSTR43 W_FSTR44 W_FSTR45 W_FSTR46 W_FSTR47 W_FSTR48 W_FSTR49 W_FSTR50 W_FSTR51 W_FSTR52 W_FSTR53 W_FSTR54 
W_FSTR55 W_FSTR56 W_FSTR57 W_FSTR58 W_FSTR59 W_FSTR60 W_FSTR61 W_FSTR62 W_FSTR63 W_FSTR64 W_FSTR65 W_FSTR66 W_FSTR67 W_FSTR68 W_FSTR69 W_FSTR70 W_FSTR71 W_FSTR72 W_FSTR73 W_FSTR74 W_FSTR75 W_FSTR76 W_FSTR77 W_FSTR78 W_FSTR79 W_FSTR80 WVARSTRR VAR_UNIT SENWGT_STU
count 485490.000000 4.854900e+05 485490.000000 485490.000000 485490.000000 485438.000000 485490.000000 485490.000000 457994.000000 479269.000000 32728.000000 4.854900e+05 4.854900e+05 4.854900e+05 301367.000000 269808.000000 283813.000000 279657.000000 289502.000000 289428.000000 299618.000000 298601.000000 291943.000000 296878.000000 298339.000000 289068.000000 255665.000000 294163.000000 311290.000000 310906.000000 310321.00000 310655.000000 310449.000000 309969.000000 310366.000000 310156.000000 485490.000000 485490.000000 485490.000000 169730.000000 169765.000000 169779.000000 169783.000000 169784.000000 169798.000000 169796.000000 169786.000000 169799.000000 169655.000000 169641.00000 169656.000000 169716.000000 169716.000000 169725.000000 169643.000000 169640.000000 169636.000000 485490.000000 485490.000000 485490.000000 485374.000000 484617.000000 314764.000000 312584.000000 311675.000000 313399.000000 416150.000000 364814.000000 312708.000000 314557.000000 13380.000000 471357.000000 314777.000000 295195.000000 473648.000000 313279.000000 312602.000000 314448.000000 310304.000000 308442.000000 429058.000000 477772.000000 13496.000000 450621.000000 479807.000000 293194.000000 13598.000000 289744.000000 290490.000000 298740.000000 477754.000000 297995.000000 165792.000000 83305.000000 83305.000000 316322.000000 316708.000000 44094.000000 43137.000000 282866.000000 313847.000000 315948.000000 301360.000000 314501.000000 283303.000000 313599.000000 312766.000000 308799.000000 473091.000000 313172.000000 314607.000000 270914.000000 313860.000000 316323.000000 314678.000000 315114.000000 315519.000000 316371.000000 297074.000000 290260.000000 292585.000000 479597.00000 306835.000000 306487.000000 307640.000000 308467.000000 308150.000000 155221.000000 155280.000000 153879.000000 308631.000000 306948.000000 308058.000000 155233.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 473031.000000 473031.000000 473031.000000 473031.000000 
473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 473031.000000 471439.000000 471439.000000 471439.000000 471439.000000 471439.000000 471439.000000 471439.000000 471439.000000 471439.000000 471439.000000 471439.000000 471439.000000 471439.000000 471439.000000 471439.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.00000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.00000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000 485490.000000
mean 242745.500000 4.315457e+06 240.152197 6134.066201 9.813323 2.579260 6.558512 1996.070061 6.148963 1.265356 6.481117 7.103064e+05 7.268785e+05 8.130810e+05 5.493770 1.623629 0.954044 0.911821 1.213363 1.583081 52.744331 52.911273 52.722812 4.228060 4.350246 4.047826 31.138885 26.017759 1.831800 2.026944 2.83132 2.180995 1.879072 2.834719 1.963050 1.930719 39.524894 40.569633 41.019310 1.648448 1.462251 1.850506 1.755941 1.470916 1.761770 1.641782 1.698255 1.663773 1.737544 1.68622 1.583711 1.666631 1.568732 1.745824 1.777203 1.717301 1.517467 18.799584 19.917745 13.867758 15.784283 -0.162964 0.152647 0.051644 0.041384 -0.022259 42.423367 44.408617 0.083241 0.095883 -0.075938 -0.041828 -0.002501 -0.071999 -0.265546 -0.003461 -0.012033 -0.013110 0.116383 -0.100571 1.889355 -0.195442 -0.011065 48.923298 -0.324815 0.005306 0.028976 0.069838 0.023047 -0.100623 -0.351888 -0.065411 -0.094665 -0.072453 -0.009537 0.108456 0.212424 1.929605 0.984004 219.276636 0.241209 -0.046626 -0.012782 0.135775 226.007056 0.177720 0.038895 11.104100 12.995225 0.140125 0.035656 211.122460 0.123262 0.166138 0.137930 0.209052 0.147423 0.152789 50.895996 0.067043 -0.025161 -0.33701 -0.033189 -0.045223 -0.082717 -0.017825 -0.008190 0.021863 0.078194 0.010586 0.029555 -0.048654 -0.009193 0.008603 469.621653 469.648358 469.648930 469.641832 469.695396 465.911005 465.722088 465.854924 465.754630 465.801144 468.575096 468.433359 468.529638 468.456902 468.479663 468.363294 468.216835 468.335575 468.217500 468.309018 468.512152 468.359786 468.458271 468.444531 468.432745 470.534430 470.549065 470.600776 470.552599 470.600025 467.799726 467.894331 467.937485 467.907263 467.873554 472.380257 472.430514 472.473404 472.416146 472.439168 472.004640 472.068052 472.022059 471.926562 472.013506 475.769824 475.813674 475.851549 475.78524 475.820184 50.919113 51.055873 50.871677 50.881269 50.827798 50.917561 51.021655 50.922471 51.005555 50.836181 50.664303 50.728190 50.904351 50.944928 50.883711 
50.880730 50.967077 51.161937 51.010651 50.930648 51.054811 50.922975 50.939376 50.848867 51.135859 51.063127 51.310100 51.099649 51.241671 50.631111 50.841715 51.184395 50.884403 51.021236 50.934120 50.960809 51.258724 50.996567 50.967408 50.655822 50.822744 51.009717 50.596088 50.740495 50.641955 51.115232 51.153540 51.242228 50.739218 51.375184 51.081902 50.974541 51.117503 50.960259 50.990849 51.115101 50.806238 50.598170 50.928356 51.207395 50.816763 50.983417 50.498705 51.227296 50.931417 50.702373 50.660514 51.274698 50.898586 50.964887 51.085841 50.85639 50.716749 50.709636 50.844201 51.020378 50.943149 50.685275 51.019842 50.540724 50.721164 40.013920 1.531189 0.140054
std 140149.035431 2.524434e+06 278.563016 6733.144944 3.734726 2.694013 3.705244 0.255250 0.970693 0.578992 4.579245 1.583832e+06 1.629829e+06 1.811846e+06 5.383815 2.591569 2.162574 2.362377 2.353292 2.760885 16.903873 17.007616 16.635498 1.652415 1.652565 2.539119 9.090506 9.223134 0.891414 0.914075 0.97202 0.909405 0.805788 0.998822 0.885696 0.854910 46.390983 45.463542 45.306261 0.477456 0.498574 0.356576 0.429529 0.499155 0.426002 0.479478 0.459016 0.472419 0.439971 0.46403 0.492944 0.471418 0.495255 0.435398 0.416125 0.450312 0.499696 29.807876 29.078596 32.474850 0.290221 0.655558 0.955031 1.002942 0.997704 0.983503 21.622126 22.018510 0.990321 1.012342 1.008322 1.001965 0.993017 1.054459 1.131791 1.011031 0.987126 1.029037 1.063550 1.024666 0.385621 1.074053 0.994293 22.120953 1.163213 1.012673 0.993405 0.997250 0.987521 1.076591 1.214732 1.048941 0.998131 0.993144 1.000565 0.983542 1.004716 1.985201 1.534719 97.997730 1.054971 0.973588 0.997417 1.009700 97.448421 1.011025 0.998720 10.476669 3.398623 0.996012 0.955625 131.368322 1.029343 1.088985 1.027669 1.045459 1.051583 0.995688 40.987895 1.031781 1.007925 1.21530 1.005530 1.020514 1.005093 1.003064 0.994403 0.995180 1.008570 1.002834 1.018915 0.985763 1.007970 1.014749 103.265391 103.382077 103.407631 103.392286 103.419170 115.051125 115.100645 115.169693 115.136083 115.215380 109.952388 109.999454 110.048931 110.070679 110.113359 111.466289 111.396856 111.448251 111.470898 111.510498 101.337279 101.317089 101.400741 101.392814 101.425391 103.390231 103.364696 103.423160 103.401790 103.311354 113.838415 113.816835 113.835473 113.804649 113.763119 106.488425 106.413471 106.446674 106.452466 106.416636 102.505523 102.626198 102.640489 102.576066 102.659989 101.464426 101.514649 101.495072 101.51220 101.566347 107.382092 121.604868 123.186458 123.227436 122.419105 119.930870 124.738867 120.194806 123.972494 119.294325 119.412749 120.967797 119.825413 122.878534 124.748298 121.087659 123.660376 125.650336 
120.683899 120.353115 124.798996 123.827474 122.493030 124.852544 125.135890 123.522659 125.903997 122.150814 127.307653 119.150566 120.745367 123.640970 120.748469 124.759321 121.965014 122.733971 126.769255 124.332223 122.566333 119.857651 124.417727 124.249714 117.967963 120.307004 119.419120 124.966911 123.245513 126.001133 122.298004 126.406935 126.330021 124.900210 126.135726 121.599760 122.039390 125.834607 122.886347 118.821342 124.521334 126.775479 121.704072 123.365810 117.352351 122.339422 121.259217 121.610738 120.271138 124.746978 122.512868 124.035949 124.301710 122.76611 121.528789 119.623395 120.684726 122.946533 121.170883 119.267686 122.981541 119.479516 119.799018 22.951264 0.539759 0.137864
min 1.000000 8.000000e+04 1.000000 1.000000 7.000000 1.000000 1.000000 1996.000000 4.000000 1.000000 0.000000 8.001000e+03 8.001000e+03 8.001000e+03 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 15.000000 15.000000 15.000000 0.000000 0.000000 0.000000 1.000000 0.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 -9.000000 15.170000 -3.000000 -2.370000 -2.990000 -3.375800 -3.690000 11.010000 11.010000 -3.253000 -3.884000 -2.721000 -1.510000 -2.480000 -3.974900 -5.950000 -2.986800 -2.733400 -3.766600 -4.572300 -7.230000 1.000000 -3.930000 -2.731100 11.010000 -6.880000 -2.444200 -2.811300 -2.157500 -2.899900 -4.017800 -3.160000 -2.803800 -2.307400 -1.539200 -1.787200 -2.300000 -1.780000 0.000000 0.000000 0.000000 -2.140200 -3.750000 -1.532900 -3.450300 0.000000 -2.864500 -3.633300 0.000000 3.000000 -4.053000 -2.180000 0.000000 -3.110000 -4.245600 -2.391900 -1.599900 -3.653100 -2.920000 0.000000 -0.774900 -1.610400 -6.65000 -2.777900 -2.609000 -3.373100 -2.822400 -3.067400 -2.217200 -1.695800 -2.743000 -2.749200 -2.013400 -2.665800 -3.027700 19.792800 6.473000 42.226200 24.622200 37.085200 6.550900 1.098300 2.812000 0.241500 1.332000 3.357200 9.588700 1.254100 6.395100 5.070900 1.254100 21.194900 35.215800 9.588700 2.033000 7.953000 2.578300 3.435100 11.536100 0.942500 1.020400 0.085700 3.746700 0.085700 11.536100 6.473000 0.163600 2.967800 0.319400 0.942500 2.578300 3.357200 1.955100 2.578300 5.849800 0.083400 0.703500 0.703500 4.134400 2.307400 2.648300 2.834800 11.879900 8.42970 17.754600 1.000000 0.500000 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 
0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.29290 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 0.292900 1.000000 1.000000 0.000500
25% 121373.250000 2.030000e+06 61.000000 1811.000000 9.000000 1.000000 4.000000 1996.000000 6.000000 1.000000 2.000000 2.030020e+05 2.080010e+05 2.080010e+05 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 45.000000 45.000000 45.000000 3.000000 3.000000 2.000000 28.000000 20.000000 1.000000 1.000000 2.00000 2.000000 1.000000 2.000000 1.000000 1.000000 2.000000 4.000000 4.000000 1.000000 1.000000 2.000000 2.000000 1.000000 2.000000 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000 2.000000 1.000000 1.000000 7.000000 9.000000 0.000000 15.580000 0.000000 -0.470000 -0.640000 -0.939400 -0.740000 25.390000 25.040000 -0.449900 -0.459300 -0.790100 -0.480000 -0.710000 -0.547900 -1.000000 -0.437100 -0.663500 -0.530000 -0.427500 -0.800000 2.000000 -0.690000 -0.753400 28.520000 -0.980000 -0.447700 -0.522100 -0.148900 -0.509600 -0.689100 -1.130000 -0.753800 -0.695500 -0.774000 -0.636000 -0.670000 -0.340000 0.000000 0.000000 165.000000 -0.456700 -0.630000 -0.733200 -0.401700 180.000000 -0.657700 -0.543300 4.000000 12.000000 -0.340700 -0.520000 120.000000 -0.480000 -0.385200 -0.594500 -0.580900 -0.561200 -0.470000 19.000000 -0.774900 -0.785400 -1.04000 -0.411000 -0.593700 -0.420600 -0.379800 -0.248900 -0.424300 -0.350200 -0.255800 -0.367100 -0.416200 -0.387000 -0.198200 395.318600 395.318600 395.240700 395.396500 395.240700 385.893400 385.737700 385.815500 385.659800 385.815500 391.190200 391.190200 391.190200 391.190200 390.956500 389.710200 389.710200 389.632300 389.632300 389.632300 396.876500 396.720700 396.720700 396.720700 396.642800 397.499600 397.499600 397.499600 397.577500 397.499600 386.750300 387.295500 387.295500 387.295500 387.295500 397.188000 397.110100 397.421700 397.421700 397.188000 403.600700 403.360100 403.360100 403.354600 403.360100 404.457300 404.457300 404.550500 404.45730 404.457300 6.386300 4.682200 4.656000 4.620500 4.645100 4.658700 4.655100 4.671300 4.652500 4.653600 4.660300 4.648100 4.656000 4.616900 4.652500 
4.679300 4.650550 4.681500 4.642000 4.645100 4.672700 4.645600 4.649175 4.654300 4.670225 4.661975 4.672700 4.649900 4.653600 4.657025 4.656425 4.651900 4.656400 4.643100 4.665100 4.631800 4.638700 4.647400 4.645600 4.638125 4.645100 4.644100 4.652400 4.665925 4.631800 4.654300 4.679300 4.682700 4.666700 4.682200 4.657200 4.666700 4.656500 4.657400 4.678700 4.631600 4.666800 4.667500 4.657400 4.662700 4.628600 4.625100 4.644500 4.657400 4.649900 4.655800 4.631800 4.640900 4.684000 4.666400 4.642100 4.65870 4.670200 4.670100 4.660300 4.664800 4.643100 4.667000 4.675200 4.651850 4.660300 20.000000 1.000000 0.037800
50% 242745.500000 4.100000e+06 136.000000 3740.000000 10.000000 1.000000 7.000000 1996.000000 6.000000 1.000000 6.000000 4.400010e+05 4.400010e+05 4.400010e+05 4.000000 1.000000 0.000000 0.000000 0.000000 1.000000 50.000000 50.000000 50.000000 4.000000 4.000000 4.000000 32.000000 25.000000 2.000000 2.000000 3.00000 2.000000 2.000000 3.000000 2.000000 2.000000 3.000000 5.000000 6.000000 2.000000 1.000000 2.000000 2.000000 1.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.00000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 8.000000 10.000000 2.000000 15.750000 0.000000 0.060000 -0.240000 0.087300 -0.150000 34.250000 43.330000 -0.078400 0.101500 -0.051100 0.250000 -0.080000 -0.001800 -0.190000 -0.068100 0.647400 -0.076000 0.082000 -0.220000 2.000000 0.040000 0.015900 48.820000 -0.260000 0.052600 -0.087400 -0.148900 0.027900 -0.087200 -0.400000 -0.083600 -0.058600 -0.226800 0.084900 0.050000 0.300000 1.000000 0.000000 200.000000 0.217100 -0.180000 -0.138100 0.109900 220.000000 0.248600 0.052100 8.000000 13.000000 0.051100 -0.060000 180.000000 -0.020000 -0.045500 0.250900 0.221700 0.167200 0.110000 39.000000 -0.774900 0.058400 -0.30000 0.052600 0.030900 0.050100 0.050500 0.173100 0.035800 0.114700 0.144900 0.154100 0.035800 0.080900 0.178600 466.201900 466.124000 466.201900 466.279800 466.435600 464.955600 464.721900 465.189300 465.189300 464.721900 467.993500 467.837700 467.837700 467.837700 467.759800 462.307200 462.229300 462.540900 462.385100 462.540900 464.488200 464.410300 464.410300 464.488200 464.332500 469.862900 469.629200 469.862900 469.862900 469.940800 463.008300 462.930400 463.008300 462.930400 462.930400 469.785000 469.862900 469.862900 469.629200 469.862900 475.455000 475.535200 475.455000 475.535200 475.535200 475.699400 475.606100 475.699400 475.97910 475.885900 15.782900 13.714300 13.543300 13.663900 13.652800 13.674850 13.668400 13.644300 13.704100 13.637700 13.714300 13.598500 13.648800 13.682400 13.702250 13.604100 13.617500 
13.600000 13.691500 13.620300 13.622700 13.679700 13.750700 13.689100 13.576900 13.672800 13.654800 13.572200 13.624800 13.650900 13.585000 13.626800 13.734000 13.672800 13.654200 13.739100 13.724100 13.686300 13.677100 13.757800 13.627000 13.682550 13.619400 13.706500 13.669200 13.663900 13.650900 13.632100 13.655700 13.598850 13.637700 13.644200 13.597000 13.617700 13.739100 13.681000 13.714400 13.670500 13.674800 13.607700 13.681000 13.710700 13.614400 13.711300 13.632100 13.777500 13.711500 13.672800 13.627000 13.707700 13.722600 13.59420 13.714300 13.668400 13.637700 13.698900 13.611700 13.672100 13.731100 13.582000 13.600200 40.000000 2.000000 0.145200
75% 364117.750000 6.880000e+06 291.000000 7456.000000 10.000000 3.000000 9.000000 1996.000000 7.000000 1.000000 10.000000 7.030020e+05 7.030020e+05 7.040020e+05 7.000000 2.000000 1.000000 1.000000 2.000000 2.000000 55.000000 55.000000 55.000000 5.000000 5.000000 5.000000 35.000000 30.000000 2.000000 3.000000 4.00000 3.000000 2.000000 4.000000 2.000000 2.000000 97.000000 97.000000 97.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.00000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 10.000000 10.000000 3.000000 16.000000 0.000000 0.790000 0.770000 1.211500 0.560000 60.920000 65.010000 0.764000 0.540300 0.353500 1.270000 0.810000 0.454600 0.610000 0.535900 0.795500 0.640000 0.638700 0.500000 2.000000 1.120000 0.719400 70.340000 0.390000 0.617300 1.192100 0.700500 1.304500 0.416000 0.240000 0.542300 0.602500 0.696400 0.714500 0.800000 0.910000 4.000000 2.000000 250.000000 0.811000 0.540000 0.658400 0.649000 250.000000 1.116900 0.463900 14.000000 16.000000 0.479500 0.650000 270.000000 0.810000 0.660200 0.764400 0.718300 0.722800 0.970000 71.000000 0.869500 0.669800 0.43000 0.441000 0.490800 0.466200 0.484000 0.513300 0.556000 0.792500 0.504600 0.583100 0.557000 0.475900 0.547700 541.057800 541.447300 541.291500 541.447300 541.447300 545.419800 545.342000 545.497700 545.419800 545.419800 546.276700 546.276700 546.276700 546.276700 546.276700 541.992500 541.836700 542.226200 542.226200 542.226200 538.409400 538.331500 538.409400 538.487300 538.409400 543.316700 543.394600 543.550400 543.238800 543.550400 544.563000 544.718800 544.718800 544.640900 544.718800 546.899800 546.744000 546.899800 546.899800 546.977700 544.502500 544.503500 544.503500 544.502500 544.503500 547.780700 547.873900 547.967200 547.78070 547.780700 44.473300 40.893700 41.314000 41.358000 41.422500 41.418700 41.179900 41.316100 41.650100 41.245500 41.313500 41.306000 41.072600 41.288725 41.406600 41.250200 41.144000 41.202000 41.559500 
41.140000 40.860700 41.179300 41.284800 41.332500 40.995900 41.202000 40.934000 41.059800 41.150100 40.999600 41.183150 41.383200 41.047400 41.145000 41.184075 41.238400 41.092800 41.245900 41.139400 40.934000 41.590400 41.262700 41.229800 41.233500 41.203700 41.471100 41.003100 41.265200 41.371800 41.165000 41.018300 41.276800 41.579000 41.210500 40.896200 41.202500 41.014300 41.123800 41.682700 41.519800 41.262700 40.896200 41.233500 41.101400 41.358000 41.097300 41.119900 41.085200 40.893700 41.043100 41.359900 41.19880 41.157100 41.056600 41.233500 41.512500 41.695200 41.097300 41.189600 41.290925 41.356000 60.000000 2.000000 0.199900
max 485490.000000 8.580000e+06 1471.000000 33806.000000 96.000000 25.000000 99.000000 1997.000000 16.000000 4.000000 16.000000 9.999999e+06 9.999999e+06 9.999999e+06 30.000000 30.000000 30.000000 30.000000 30.000000 30.000000 180.000000 180.000000 180.000000 40.000000 40.000000 40.000000 200.000000 200.000000 4.000000 4.000000 4.00000 4.000000 4.000000 4.000000 4.000000 4.000000 99.000000 99.000000 99.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.00000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000 99.000000 99.000000 99.000000 16.330000 3.000000 2.550000 2.350000 1.211500 2.630000 88.960000 88.960000 2.198900 3.201900 1.535500 1.270000 1.850000 4.431900 3.690000 3.203900 0.795500 3.906700 4.444100 5.180000 3.000000 1.120000 1.305000 88.960000 4.150000 3.733200 1.192100 2.408300 1.304500 2.783300 1.150000 2.826100 2.784600 1.701000 1.337800 1.590000 2.290000 5.000000 4.000000 2400.000000 4.424900 2.270000 1.456500 2.716700 3000.000000 1.843300 2.446500 180.000000 18.000000 3.528600 2.260000 2975.000000 2.160000 3.858700 2.629500 3.310800 2.563000 1.680000 206.000000 2.801100 4.109900 3.25000 3.042300 2.000900 3.255400 2.973400 3.460300 2.530800 2.998400 2.970700 2.637900 2.807300 2.821100 3.426000 962.229300 957.010400 935.745400 943.456900 907.625800 980.845900 990.193200 979.443800 981.936400 988.167900 934.888600 940.341200 977.107000 954.517800 929.436000 1082.107800 1107.812700 1086.781400 1022.129600 1113.265300 941.120100 947.351600 920.867700 917.050900 916.973000 895.162800 895.240700 932.084400 957.477800 888.230300 1009.666600 1018.312800 1100.334900 1098.465500 1039.422000 918.297200 934.888600 909.261600 974.614400 941.353800 904.802600 881.239200 884.447000 881.159000 901.608600 903.338300 900.540800 867.624000 926.55730 880.958600 2597.884400 3049.111000 4602.289700 4348.013600 4526.703400 3466.265300 4228.624900 3644.604800 4067.168700 3049.111000 3049.111000 3429.354200 
3181.109000 3840.874600 5096.801400 3847.578800 3840.874600 4769.874900 3319.235200 3340.339800 3840.874600 3049.111000 3625.434500 3737.661300 3292.994700 3743.450100 3820.647300 3961.382400 3689.182300 3049.111000 3177.984600 3703.587300 3457.601000 3904.868100 3292.994700 4155.283000 4231.172200 3239.977000 3565.008900 3607.478300 3904.868100 4158.880200 3073.201300 3080.276200 2590.724000 4440.492600 2789.766100 4067.168700 3292.118800 3840.874600 4602.289700 4348.013600 4769.874900 3189.080500 2657.458400 4228.624900 3466.265300 2994.771900 3840.874600 4769.874900 3139.290800 3625.434500 3340.339800 3326.616500 2782.554900 3743.450100 2996.352000 3961.382400 3535.884600 4231.172200 3625.434500 3737.66130 3457.601000 3466.265300 2476.566800 4155.283000 3743.450100 3232.163700 3904.868100 3607.478300 3412.174100 80.000000 3.000000 5.095500
In [8]:
df_free.sample(5)
Out[8]:
country SUBNATIO political_rights civil_liberties freedom_status
64 Spain 7241000 1 1 F
17 Denmark 2080000 1 1 F
45 Peru 6040000 2 3 F
2 Argentina 320100 2 2 F
21 Germany 2760000 1 1 F
In [9]:
df_free.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86 entries, 0 to 85
Data columns (total 5 columns):
country             86 non-null object
SUBNATIO            86 non-null int64
political_rights    86 non-null int64
civil_liberties     86 non-null int64
freedom_status      86 non-null object
dtypes: int64(3), object(2)
memory usage: 3.4+ KB

What is the structure of your dataset?

The set has 485,490 rows, each representing an individual student in the survey. Its 636 columns describe the student: where they live, their economic, social, and cultural status, their attitudes about learning, and their performance on a panel of math, reading, and science assessments. Many of the qualitative columns contain string values that represent a scale (e.g. "Never" -> "Every day"). There's a wealth of data here!

What is/are the main feature(s) of interest in your dataset?

The features of interest from the PISA dataset to me are primarily:

  • NC - country code (there is another code called CNT but that seems to include sub-regions, such as Florida and Massachusetts)
  • ST04Q01 - the student's gender (Male or Female)
  • PV1MATH-PV5MATH - plausible values that represent scores on the mathematics section
  • PV1READ-PV5READ - plausible values that represent scores on the reading section
  • PV1SCIE-PV5SCIE - plausible values that represent scores on the science section
  • EC06Q01 - the year the student began learning the language the test was given in
  • ESCS - a single index that combines several economic, social, and cultural status factors. It is expressed in standard deviations above or below the population mean (where 0 = mean). The ESCS index was first used in the PISA 2000 analysis, where it was derived from five indices: highest occupational status of parents (HISEI), highest educational level of parents (in years of education according to ISCED), family wealth, cultural possessions, and home educational resources (the last three are WLEs based on student reports on home possessions). The ESCS scores were obtained as component scores for the first principal component, with zero being the score of an average OECD student and one being the standard deviation across equally weighted OECD countries.

And from the Freedom House dataset I'll be using:

  • political_rights - degree of a country's political freedom, scale of 1=most free to 7=least free
  • civil_liberties - degree of a country's civil liberties, scale of 1=most free to 7=least free
  • freedom_status - a single overall rating of the country's freedom, F, PF, and NF stand for Free, Partly Free, and Not Free

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

For my assessment, I'll be using ESCS to represent student economic, social, and cultural status. I'll create new columns that average the five plausible values for math, reading, and science into a single score per subject. I'll also be looking at these factors against the Freedom House freedom scores.

Cleaning

In [10]:
# Make a copy for cleaning
df_clean = df.copy()

I noticed that several of the string values have leading or trailing spaces. The NC (country) column is the only affected one I'll be using.

In [11]:
# Trim trailing/leading spaces from the country names
df_clean['NC'] = df_clean['NC'].str.strip()

Freedom House data

Now I'm going to join the Freedom House data to the student records on SUBNATIO.

In [12]:
# Join the Freedom House data on SUBNATIO (inner join by default),
# drop the dupe country column
df_clean = pd.merge(df_clean, df_free, on='SUBNATIO')
df_clean.drop(columns=['country'], inplace=True)
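
Because pd.merge defaults to an inner join, any student row whose SUBNATIO has no match in the Freedom House table is dropped without warning. That is probably why the final frame has 480,531 rows against 485,490 in the raw file. Here is a minimal sketch of how to check for unmatched rows. The `students` and `freedom` frames are small made-up stand-ins, not the real data:

```python
import pandas as pd

# Hypothetical stand-ins for df_clean and df_free
students = pd.DataFrame({
    'SUBNATIO': [7240000, 2080000, 9990000],
    'PV1MATH': [480.0, 510.0, 455.0],
})
freedom = pd.DataFrame({
    'SUBNATIO': [7240000, 2080000],
    'civil_liberties': [1, 1],
})

# A left join with indicator=True keeps every student row and records
# whether a match was found in the right-hand frame.
checked = pd.merge(students, freedom, on='SUBNATIO', how='left',
                   indicator=True)

unmatched = checked[checked['_merge'] == 'left_only']
print(f"{len(unmatched)} of {len(checked)} rows have no freedom score")
print(unmatched['SUBNATIO'].unique())
```

Running the same check on the real frames would show which SUBNATIO codes are missing from the Freedom House file.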
In [13]:
df_clean['civil_liberties'].corr(df_clean['political_rights'])
Out[13]:
0.9321221136849408

The civil_liberties and political_rights values are very highly positively correlated, so I'm only going to use civil_liberties in my analysis. A value that reflects individual freedom is most relevant to individual academic performance.
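
One caveat: this correlation is computed on the merged, student-level frame, so each country counts once per sampled student. The sketch below uses a made-up table (the `free` ratings and `n_students` counts are hypothetical) to show how that can differ from a correlation computed with one row per country:

```python
import numpy as np
import pandas as pd

# Hypothetical country-level ratings (1 = most free, 7 = least free)
free = pd.DataFrame({
    'country': ['A', 'B', 'C', 'D'],
    'political_rights': [1, 1, 7, 4],
    'civil_liberties': [1, 2, 6, 2],
})

# Hypothetical number of sampled students per country
n_students = np.array([10, 10, 1, 1])

# One row per country: every country carries equal weight
country_level = free['civil_liberties'].corr(free['political_rights'])

# One row per student: countries with more students dominate
student_rows = free.loc[free.index.repeat(n_students)]
student_level = student_rows['civil_liberties'].corr(
    student_rows['political_rights'])

print(f"country-level r = {country_level:.3f}")
print(f"student-level r = {student_level:.3f}")
```

On the real data, the country-level figure could be computed directly from df_free.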

Math/Reading/Science/Overall Literacy Scores

From the PISA Data Visualization Contest instructions: "Pupil performance in mathematics, reading and science is coded by plausible values. You can find them in columns: PV1MATH-PV5MATH (for math), PV1READ-PV5READ (for reading) and PV1SCIE-PV5SCIE (for science). For given area all five values PV1-PV5 are just independent estimations of the student performance in given area. For exploration it is fine to use only PV1."

Based on this, I've decided to use the average of the five plausible values in each subject as that subject's "literacy" value. I then average all fifteen plausible values (five each for math, reading, and science) to get an overall literacy score.
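
An alternative often suggested for plausible values is to compute the statistic of interest once per plausible value and then average the five results, rather than averaging the values for each student first. Below is a rough sketch with simulated scores. The toy frame and the choice of the correlation with ESCS as the statistic are assumptions for illustration only, and survey weights are ignored:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: ESCS plus five noisy plausible values per student
escs = rng.normal(0, 1, n)
toy = pd.DataFrame({'ESCS': escs})
pv_cols = [f'PV{i}MATH' for i in range(1, 6)]
for col in pv_cols:
    toy[col] = 470 + 40 * escs + rng.normal(0, 80, n)

# Statistic computed separately for each plausible value ...
per_pv = [toy[col].corr(toy['ESCS']) for col in pv_cols]
# ... then averaged into a single estimate
pooled = float(np.mean(per_pv))

# For comparison: the statistic computed on the per-student average
averaged_first = toy[pv_cols].mean(axis=1).corr(toy['ESCS'])

print(f"average of per-PV correlations: {pooled:.3f}")
print(f"correlation of averaged PVs:    {averaged_first:.3f}")
```

In this simulated data the averaged score correlates more strongly with ESCS, because averaging cancels part of the noise in the individual values. For quick exploration either approach gives a similar picture.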

In [14]:
# PISA scores are scaled so the OECD mean is about 500 with a
# standard deviation of about 100

# Average of the five MATH assessment scores
df_clean['math_literacy'] = (
    (df_clean['PV1MATH'] + df_clean['PV2MATH'] + df_clean['PV3MATH'] +
     df_clean['PV4MATH'] + df_clean['PV5MATH']) / 5)

# Average of the five READING assessment scores
df_clean['read_literacy'] = (
    (df_clean['PV1READ'] + df_clean['PV2READ'] + df_clean['PV3READ'] +
     df_clean['PV4READ'] + df_clean['PV5READ']) / 5)

# Average of the five SCIENCE assessment scores
df_clean['sci_literacy'] = (
    (df_clean['PV1SCIE'] + df_clean['PV2SCIE'] + df_clean['PV3SCIE'] +
     df_clean['PV4SCIE'] + df_clean['PV5SCIE']) / 5)

# Average of all assessment scores
df_clean['overall_literacy'] = (
    (df_clean['PV1MATH'] + df_clean['PV2MATH'] + df_clean['PV3MATH'] +
     df_clean['PV4MATH'] + df_clean['PV5MATH'] + df_clean['PV1READ'] +
     df_clean['PV2READ'] + df_clean['PV3READ'] + df_clean['PV4READ'] +
     df_clean['PV5READ'] + df_clean['PV1SCIE'] + df_clean['PV2SCIE'] +
     df_clean['PV3SCIE'] + df_clean['PV4SCIE'] + df_clean['PV5SCIE']) / 15)
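
The same per-student averages can be written more compactly by building the column names and taking a row-wise mean. This is a minimal sketch on a two-row toy frame with made-up values. `skipna=False` is chosen so that a missing plausible value yields NaN, which matches what adding the columns with `+` does:

```python
import numpy as np
import pandas as pd

subjects = {
    'math_literacy': 'MATH',
    'read_literacy': 'READ',
    'sci_literacy': 'SCIE',
}

# Hypothetical toy frame holding the 15 plausible-value columns
toy = pd.DataFrame({
    f'PV{i}{suffix}': [400.0 + 10 * i, 500.0 + i]
    for suffix in subjects.values() for i in range(1, 6)
})

all_pv_cols = []
for new_col, suffix in subjects.items():
    cols = [f'PV{i}{suffix}' for i in range(1, 6)]
    all_pv_cols.extend(cols)
    # Row-wise mean of the five plausible values for this subject
    toy[new_col] = toy[cols].mean(axis=1, skipna=False)

# Row-wise mean of all 15 plausible values
toy['overall_literacy'] = toy[all_pv_cols].mean(axis=1, skipna=False)

print(toy[list(subjects) + ['overall_literacy']])
```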

Age Started Learning Language

In [15]:
# EC06Q01 = age started learning
df_clean.EC06Q01.unique()
Out[15]:
array([nan, '0 to 3 years', '4 to 6 years', '10 to 12 years',
       '7 to 9 years', '13 years or older'], dtype=object)
In [16]:
# Convert the EC06Q01 field to an ordered dtype
ordered_var = pd.api.types.CategoricalDtype(ordered=True,
                                            categories=[
                                                '0 to 3 years', '4 to 6 years',
                                                '7 to 9 years',
                                                '10 to 12 years',
                                                '13 years or older'
                                            ])

df_clean['EC06Q01'] = df_clean['EC06Q01'].astype(ordered_var)
In [17]:
# Test that the ordered categorical dtype worked
df_clean['EC06Q01'].unique()
Out[17]:
[NaN, 0 to 3 years, 4 to 6 years, 10 to 12 years, 7 to 9 years, 13 years or older]
Categories (5, object): [0 to 3 years < 4 to 6 years < 7 to 9 years < 10 to 12 years < 13 years or older]
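
The reason for converting to an ordered dtype is that sorting, comparisons, and plots then follow the age order rather than alphabetical order, which would put "10 to 12 years" before "4 to 6 years". A small sketch with made-up responses (the `ages` series is hypothetical):

```python
import pandas as pd

age_order = ['0 to 3 years', '4 to 6 years', '7 to 9 years',
             '10 to 12 years', '13 years or older']
age_dtype = pd.api.types.CategoricalDtype(categories=age_order, ordered=True)

# Hypothetical responses, including a missing one
ages = pd.Series(['10 to 12 years', '0 to 3 years', None, '7 to 9 years'],
                 dtype=age_dtype)

# Sorting follows the category order; missing values go last
sorted_ages = ages.sort_values().tolist()
print(sorted_ages)

# Comparisons against a category label work on ordered categoricals
started_late = (ages >= '7 to 9 years').tolist()
print(started_late)

# Integer codes follow the category order; -1 marks a missing value
codes = ages.cat.codes.tolist()
print(codes)
```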

Advantaged/Disadvantaged Status

I'm considering a student to be "disadvantaged" if their ESCS score is at least one standard deviation below the mean (ESCS <= -1) and "advantaged" if their score is at least one standard deviation above the mean (ESCS >= 1).

In [18]:
df_clean['disadvantaged'] = df_clean['ESCS'].dropna().apply(lambda x: 1
                                                            if x <= -1 else 0)

df_clean['advantaged'] = df_clean['ESCS'].dropna().apply(lambda x: 1
                                                            if x >= 1 else 0)
In [19]:
# Test new columns
# Expect: -1.0
print(df_clean.query('disadvantaged == 1')['ESCS'].max())

# Expect: 1.0
print(df_clean.query('advantaged == 1')['ESCS'].min())
-1.0
1.0
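
As an alternative to `apply` with a lambda, the same flags can be built with vectorized comparisons. This is a sketch on a made-up ESCS series, not the notebook's own code; `.where` is used so that students with a missing ESCS stay NaN instead of being counted as 0.

```python
import numpy as np
import pandas as pd

# Made-up ESCS values, including one missing entry
escs = pd.Series([-2.1, -1.0, 0.3, 1.0, 1.7, np.nan])

# Boolean comparison, cast to float, then restore NaN where ESCS is missing
disadvantaged = (escs <= -1).astype(float).where(escs.notna())
advantaged = (escs >= 1).astype(float).where(escs.notna())

print(pd.DataFrame({'ESCS': escs,
                    'disadvantaged': disadvantaged,
                    'advantaged': advantaged}))
```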

Remove unused columns

Our dataframe is massive, which can make it unwieldy to analyze. We're going to reduce it to only those columns we'll be using in our analysis.

In [20]:
df_clean = df_clean[[
    'STIDSTD', 'NC', 'ESCS', 'EC06Q01', 'ST04Q01', 'overall_literacy',
    'math_literacy', 'read_literacy', 'sci_literacy', 'civil_liberties',
    'disadvantaged', 'advantaged'
]]
In [21]:
df_clean.rename(columns={
    'STIDSTD': 'student_id',
    'NC': 'country',
    'EC06Q01': 'age_start_learn',
    'ST04Q01': 'gender'
},
                inplace=True)
In [22]:
df_clean.sample(5)
Out[22]:
student_id country ESCS age_start_learn gender overall_literacy math_literacy read_literacy sci_literacy civil_liberties disadvantaged advantaged
385498 1970 Portugal -1.51 NaN Female 472.980340 465.96820 486.59834 466.37448 1 1.0 0.0
335290 19588 Mexico -1.05 NaN Male 496.585967 504.68140 493.33834 491.73816 3 1.0 0.0
465832 443 Turkey -2.58 NaN Male 421.732353 398.66800 456.84984 409.67922 4 1.0 0.0
168521 10216 Spain -1.94 NaN Female 401.054193 373.58624 416.54016 413.03618 1 1.0 0.0
439381 1698 Slovenia -0.94 NaN Male 365.863327 375.45566 327.09514 395.03918 1 0.0 0.0
In [23]:
df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 480531 entries, 0 to 480530
Data columns (total 12 columns):
student_id          480531 non-null int64
country             480531 non-null object
ESCS                468689 non-null float64
age_start_learn     40345 non-null category
gender              480531 non-null object
overall_literacy    480531 non-null float64
math_literacy       480531 non-null float64
read_literacy       480531 non-null float64
sci_literacy        480531 non-null float64
civil_liberties     480531 non-null int64
disadvantaged       468689 non-null float64
advantaged          468689 non-null float64
dtypes: category(1), float64(7), int64(2), object(2)
memory usage: 44.5+ MB
In [24]:
# Save a copy of the cleaned dataframe
df_clean.to_csv('data/pisa2012_clean.csv')

Univariate Exploration

In this section, I investigate the distributions of individual variables, looking for unusual points or outliers before examining relationships between variables.

Which countries did the students come from?

In [25]:
fig, ax = plt.subplots()
fig.set_size_inches(8.5, 11)
df_clean['country'].value_counts(ascending=True).plot(kind="barh", fontsize=11)
ax.set_ylabel("Country", fontsize=14)
ax.set_xlabel("Number of Students", fontsize=14)
plt.show()
fig.savefig("images/students-by-country.png")

Which countries have the highest percentage of disadvantaged students?

In [26]:
fig, ax = plt.subplots()
fig.set_size_inches(8.5, 11)
dt = (df_clean.query('disadvantaged == 1').groupby('country')['student_id'].count() /
      df_clean.groupby('country')['student_id'].count()).sort_values() * 100
dt.dropna().plot(kind="barh", fontsize=11)
ax.set_ylabel("Country", fontsize=14)
ax.set_xlabel("Percentage of Students Disadvantaged", fontsize=14)
plt.show()
fig.savefig("images/disadvantaged-students-by-country.png")
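
Because `disadvantaged` is a 0/1 flag, a per-country percentage can also be computed as a group mean. The sketch below uses made-up data and shows one subtlety worth being aware of: a plain mean skips students whose flag is NaN, whereas the count ratio used above divides by all students in the country.

```python
import numpy as np
import pandas as pd

# Made-up data: country A has one student with a missing flag
toy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'A', 'B', 'B'],
    'disadvantaged': [1.0, 0.0, 0.0, np.nan, 1.0, 1.0],
})

# Share among students with a known flag (NaN rows are skipped by mean)
pct_known = toy.groupby('country')['disadvantaged'].mean() * 100

# Share among all students: treat a missing flag as "not flagged"
pct_all = toy['disadvantaged'].fillna(0).groupby(toy['country']).mean() * 100

print(pct_known)
print(pct_all)
```

Unlike the count ratio, this form returns 0 rather than NaN for a country with no flagged students, so no `dropna()` is needed before plotting.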

Which countries have the highest percentage of advantaged students?

In [27]:
fig, ax = plt.subplots()
fig.set_size_inches(8.5, 11)
dt = (df_clean.query('advantaged == 1').groupby('country')['student_id'].count() /
      df_clean.groupby('country')['student_id'].count()).sort_values() * 100
dt.dropna().plot(kind="barh", fontsize=11)
ax.set_ylabel("Country", fontsize=14)
ax.set_xlabel("Percentage of Students Advantaged", fontsize=14)
plt.show()
fig.savefig("images/advantaged-students-by-country.png")

Age Started Learning Language

In [28]:
df_clean['age_start_learn'].value_counts(normalize=True) * 100
Out[28]:
0 to 3 years         64.617673
4 to 6 years         22.820672
7 to 9 years          7.420994
10 to 12 years        3.160243
13 years or older     1.980419
Name: age_start_learn, dtype: float64
In [29]:
fig, ax = plt.subplots()
sns.countplot(data=df_clean, x='age_start_learn', palette='RdBu_r')
plt.show()
fig.savefig("images/age-started-learning.png")

Overall Math/Reading/Science Scores

In [30]:
# Function for plotting multiple histograms from here:
# https://stackoverflow.com/questions/29530355/plotting-multiple-histograms-in-grid
def draw_histograms(df, variables, n_rows, n_cols):
    fig = plt.figure()
    for i, var_name in enumerate(variables):
        ax = fig.add_subplot(n_rows, n_cols, i + 1)
        df[var_name].hist(bins=100, ax=ax)
        ax.set_title(var_name + " Distribution")
    fig.set_size_inches(11, 8)
    fig.tight_layout()  # Improves appearance a bit.
    plt.show()


draw_histograms(df_clean, 
                ['math_literacy', 
                 'read_literacy', 
                 'sci_literacy',
                 'overall_literacy'], 
                2, 2)

As expected and described in the PISA literature, each of these scores follows a roughly normal (bell-shaped) distribution centered at approximately 500.

Bivariate Exploration

First let's look at a correlation plot for each of the numeric variables in our dataframe.

In [31]:
numeric_vars = [
    'ESCS', 'overall_literacy', 'math_literacy', 'read_literacy',
    'sci_literacy', 'civil_liberties'
]
In [32]:
# correlation plot
plt.figure(figsize=[10, 10])
sns.heatmap(df_clean[numeric_vars].corr(),
           annot=True,
           fmt='.3f',
           cmap='RdBu',
           center=0)
plt.show()

We can see that there is quite a bit of correlation between the math, reading, and science literacy scores. (Of course, each of these is also highly correlated with the overall literacy score, which is derived from them.) ESCS has a medium correlation with the academic scores. Civil liberties appears to be weakly negatively correlated with the academic scores (in other words, as countries become less free, indicated by higher ratings, literacy scores decline). We don't yet know whether that is statistically significant; we'll determine that later.

In [33]:
# plot matrix: sample 1000 rows so that plots are clearer and
# they render faster
samples = np.random.choice(df_clean.shape[0], 1000, replace=False)
df_samp = df_clean.loc[samples, :]

g = sns.PairGrid(data=df_samp.dropna(), vars=numeric_vars)
g = g.map_diag(plt.hist, bins=20)
g.map_offdiag(plt.scatter)
plt.show()
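
The sampling step above picks row positions with `np.random.choice` and then looks them up with `.loc`, which relies on the index labels running from 0 to n-1. A sketch of an alternative, assuming nothing beyond pandas itself, is `DataFrame.sample`, which draws by position and accepts a seed for repeatability. The frame below is a made-up stand-in.

```python
import numpy as np
import pandas as pd

# Stand-in frame with a non-contiguous index, as can happen after filtering
toy = pd.DataFrame({'score': np.arange(5000.0)}, index=np.arange(0, 10000, 2))

# Draws rows without replacement, regardless of the index labels;
# random_state makes the draw repeatable
samp_a = toy.sample(n=1000, random_state=42)
samp_b = toy.sample(n=1000, random_state=42)

print(len(samp_a), samp_a.index.equals(samp_b.index))
```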

Relationship between Overall Literacy and Civil Liberties

In [34]:
fig, ax = plt.subplots()
sns.violinplot(data=df_clean,
            x='civil_liberties',
            y='overall_literacy',
            palette='RdBu_r')
ax.set_ylabel("Overall Literacy", fontsize=14)
ax.set_xlabel("Civil Liberties score (1=most, 6=least free)", fontsize=14)
plt.show()
fig.savefig("images/overall-literacy-civil-liberties.png")

When we look at overall literacy by civil liberties category (1 = most free; 6 is the least free rating shown) we can see that the most and least free countries seem to perform the best, with a slight decline from most to least free in the middle range (2-5). This is intriguing, and something we'll examine in more detail later.

Relationship between Overall Literacy and Economic, Social, Cultural Opportunity

In [35]:
fig, ax = plt.subplots()
fig.set_size_inches(11, 8)
sns.scatterplot(data=df_clean,
                x='overall_literacy',
                y='ESCS',
                s=25,
                palette='RdBu_r')
plt.title("ESCS vs. Overall Literacy")
plt.ylabel("ESCS (+/- stdev)")
plt.xlabel("Overall Literacy Score")
plt.show()

When we plot overall literacy scores against ESCS we can see some correlation. We can measure that:

In [36]:
df_clean['ESCS'].corr(df_clean['overall_literacy'])
Out[36]:
0.42956613577759306

ESCS has a medium positive correlation with overall literacy.
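
For reference, `Series.corr` computes the Pearson correlation coefficient by default. The sketch below works the formula by hand on a few made-up pairs (standing in for ESCS and overall literacy) and checks it against NumPy's built-in.

```python
import numpy as np

# Made-up paired values standing in for ESCS and overall literacy
x = np.array([-1.5, -0.5, 0.0, 0.8, 1.2])
y = np.array([430.0, 470.0, 455.0, 520.0, 540.0])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the respective means
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# NumPy's built-in gives the same number
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)
```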

In [37]:
df_clean['ESCS'].corr(df_clean['civil_liberties'])
Out[37]:
-0.1783027020940104

ESCS has a weak negative correlation with civil liberties (meaning less free countries tend to have lower ESCS).

Relationship between Reading Literacy and Age Started Learning

In [38]:
fig, ax = plt.subplots()
sns.violinplot(data=df_clean,
               x='age_start_learn',
               y='read_literacy',
               palette='RdBu_r')
ax.set_ylabel("Reading Literacy", fontsize=14)
ax.set_xlabel("Age Started Learning", fontsize=14)
ax.set_title("Reading Literacy by Age Started Learning")
plt.show()
fig.savefig("images/read-literacy-age-started-learning.png")

From a visual inspection, it appears that reading literacy declines the later a student started learning the test language. We might expect that, since students were age 15 when they took the exam: students in the 13 years or older category would have been studying the test language for no more than 2 years.

Below I'll perform some regression analysis to determine if this decline is statistically significant.

Relationship between Disadvantage/Advantage and Overall Literacy

In [39]:
df1 = pd.DataFrame(df_clean.query('disadvantaged == 1'),
                   columns=['overall_literacy']).assign(opp='Disadvantaged')
df2 = pd.DataFrame(df_clean.query('advantaged == 0 and disadvantaged == 0'),
                   columns=['overall_literacy'
                           ]).assign(opp='Neither Advantaged\nnor Disadvantaged')
df3 = pd.DataFrame(df_clean.query('advantaged == 1'),
                   columns=['overall_literacy']).assign(opp='Advantaged')

cdf = pd.concat([df1, df2, df3])

# var_name expects a single string naming the melted "variable" column
mdf = pd.melt(cdf, id_vars=['opp'], var_name='measure')

fig, ax = plt.subplots()
fig.set_size_inches(11, 8)
ax = sns.violinplot(x='opp', y='value', data=mdf, palette='RdBu')
ax.set_xlabel(None)
ax.set_ylabel("Overall Literacy")
ax.set_title("Overall Literacy by Opportunity")
plt.show()
fig.savefig("images/overall-literacy-by-opportunity.png")

We can see a stark difference in overall academic literacy based on opportunity status. Scores for disadvantaged students are concentrated below 500, while scores for advantaged students are concentrated above it. We'll look at whether this difference is statistically significant.

Multivariate Exploration

Relationship between Gender, Age Started Learning Language, and Reading Literacy

In [40]:
fig, ax = plt.subplots()
sns.boxplot(data=df_clean,
            x='age_start_learn',
            y='read_literacy',
            palette=['mediumorchid', 'mediumaquamarine'],
            hue='gender')
ax.set_ylabel("Reading Literacy", fontsize=14)
ax.set_xlabel("Age Started Learning Language", fontsize=14)
plt.show()
fig.savefig("images/read-literacy-gender-age-started-learning.png")

Here we observe again that reading performance appears to decline the later a student learned the test language. Keep in mind that the test subjects were 15-year-olds, meaning that students who only started learning the test language at 13 years or older had fewer than two years of familiarity with the language at test time.

Male students appear to perform worse overall than female students, and the performance gap between female and male students seems to increase the later students first began learning the language.

Relationship between ESCS, Age Started Learning, and Reading Literacy

In [41]:
fig, ax = plt.subplots()
fig.set_size_inches(12, 10)
sns.scatterplot(data=df_clean.dropna(),
                x='read_literacy',
                y='ESCS',
                palette="RdBu_r",
                hue='age_start_learn',
                s=50)
ax.set_xlabel("Reading Literacy", fontsize=14)
ax.set_ylabel("ESCS (+/- stdev)", fontsize=14)
plt.show()

We can revisit our ESCS vs. literacy plot, this time looking only at reading literacy, with points colored by the age at which the student began learning the language. We can see several points for 13 years or older in the lower left of the plot: these students both performed poorly on reading literacy and fall below the mean ESCS (disadvantaged).

Relationship between Country, Overall Literacy, and Civil Liberties

In [42]:
cnt_sort = df_clean.groupby('country')['overall_literacy'].mean().sort_values(
    ascending=False)

fig, ax = plt.subplots()
fig.set_size_inches(8.5, 20)
sns.boxplot(data=df_clean,
            y='country',
            x='overall_literacy',
            order=cnt_sort.index.get_level_values('country'),
            dodge=False,
            palette='RdBu_r',
            hue='civil_liberties',
            width=0.8,
            linewidth=1.5)
ax.set_ylabel("Country", fontsize=12)
ax.set_xlabel("Avg. Overall Literacy Score", fontsize=12)
legend = ax.legend(loc='best', title_fontsize=10).set_title(
    'Civil Liberties\n(1=most free,\n6=least free)')
ax.tick_params(labelsize=12)
plt.show()
fig.savefig("images/overall-literacy-country.png")

Relationship between Country, ESCS, and Civil Liberties

In [43]:
cnt_sort = df_clean.groupby('country')['ESCS'].mean().sort_values(
    ascending=False).dropna()

fig, ax = plt.subplots()
fig.set_size_inches(8.5, 20)
sns.boxplot(data=df_clean,
            y='country',
            x='ESCS',
            order=cnt_sort.index.get_level_values('country'),
            dodge=False,
            palette='RdBu_r',
            hue='civil_liberties',
            width=0.8,
            linewidth=1.5)
ax.set_ylabel("Country", fontsize=12)
ax.set_xlabel("ESCS (0=average, +/- stdev)", fontsize=12)
legend = ax.legend(loc='lower right', title_fontsize=10).set_title(
    'Civil Liberties\n(1=most free,\n6=least free)')
ax.tick_params(labelsize=13)
plt.show()
fig.savefig("images/escs-country.png")

ESCS values seem to be missing for Albania. That made me suspect an error on my part, especially because Albania comes first alphabetically and its data were at the top of the imported CSV. However, when I investigated further I found that the Albania data are missing intentionally, and for good reason:

"For example, the reliability of parental occupation data from Albania was subject to scrutiny, resulting in a recommendation that all data dependant on Albania’s parental occupation data (in particular, all data that use the PISA index of economic, social and cultural status [ESCS]) should be deleted from the database and relevant tables." (Source: PISA 2012 Technical Report, p. 280)

Regression Analysis

In [44]:
df_new = df_clean.copy()

# Add an intercept column
df_new['intercept'] = 1
In [45]:
# Create one-hot encoded columns for Age Started Learning
df_new = df_new.join(pd.get_dummies(df_new['age_start_learn']))

# Create one-hot encoded columns for Gender
df_new = df_new.join(pd.get_dummies(df_new['gender']))
In [46]:
df_new.sample(5)
Out[46]:
student_id country ESCS age_start_learn gender overall_literacy math_literacy read_literacy sci_literacy civil_liberties disadvantaged advantaged intercept 0 to 3 years 4 to 6 years 7 to 9 years 10 to 12 years 13 years or older Female Male
451940 3610 Chinese Taipei 0.72 NaN Female 599.764133 643.17652 602.64702 553.46886 2 0.0 0.0 1 0 0 0 0 0 1 0
464034 3052 Tunisia -2.03 NaN Male 357.264553 395.16280 343.69540 332.93546 4 1.0 0.0 1 0 0 0 0 0 0 1
245539 4950 Italy 1.63 NaN Female 396.003427 384.80292 423.92724 379.28012 1 0.0 1.0 1 0 0 0 0 0 1 0
146976 7031 Spain 0.94 NaN Female 549.037973 513.95078 596.29256 536.87058 1 0.0 0.0 1 0 0 0 0 0 1 0
246829 6240 Italy -0.59 NaN Male 542.199907 605.63170 502.56072 518.40730 1 0.0 0.0 1 0 0 0 0 0 0 1
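
Note that every sampled row above has NaN for age_start_learn and zeros in all five age columns. That is how `pd.get_dummies` treats missing values by default, as the following sketch on made-up data shows. The `dummy_na=True` option is shown only as an illustration of how missing rows could be given their own indicator; the notebook does not use it.

```python
import numpy as np
import pandas as pd

# Made-up responses with two missing values
ages = pd.Series(['0 to 3 years', np.nan, '4 to 6 years', np.nan])

# Default: a missing value becomes a row of zeros in every dummy column
dummies = pd.get_dummies(ages, dtype=int)

# dummy_na=True adds an explicit indicator column for the missing rows
dummies_na = pd.get_dummies(ages, dummy_na=True, dtype=int)

print(dummies)
print(dummies_na)
```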

Does Age Started Learning affect performance on reading?

Here our null hypothesis is that the mean reading performance of each later-starting age group ($\mu_{group}$) is equal to the mean of students who began learning between 0 and 3 years ($\mu_{\text{0-3 years}}$). We express that as:

$$\large H_{0}: \mu_{group} = \mu_{\text{0-3 years}}$$

Our alternative hypothesis is that, for an individual age group, the mean reading performance is not equal to the mean of the 0 to 3 years group:

$$\large H_{1}: \mu_{group} \neq \mu_{\text{0-3 years}}$$

Our $\alpha$ (alpha) is 0.05.

In [47]:
# Regression analysis of read_literacy scores
# holding back 0 to 3 years age started learning as baseline
lm = sm.OLS(
    df_new['read_literacy'],
    df_new[[
        'intercept', '4 to 6 years', '7 to 9 years',
        '10 to 12 years', '13 years or older'
    ]],
)
results = lm.fit()
results.summary()
Out[47]:
OLS Regression Results
Dep. Variable: read_literacy R-squared: 0.001
Model: OLS Adj. R-squared: 0.001
Method: Least Squares F-statistic: 156.4
Date: Wed, 29 May 2019 Prob (F-statistic): 4.99e-134
Time: 22:39:28 Log-Likelihood: -2.8899e+06
No. Observations: 480531 AIC: 5.780e+06
Df Residuals: 480526 BIC: 5.780e+06
Df Model: 4
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
intercept 471.9788 0.145 3255.401 0.000 471.695 472.263
4 to 6 years -1.6168 1.042 -1.552 0.121 -3.659 0.425
7 to 9 years -19.2518 1.815 -10.607 0.000 -22.809 -15.694
10 to 12 years -30.0849 2.776 -10.836 0.000 -35.526 -24.643
13 years or older -69.8937 3.505 -19.939 0.000 -76.764 -63.023
Omnibus: 3695.549 Durbin-Watson: 0.955
Prob(Omnibus): 0.000 Jarque-Bera (JB): 3685.113
Skew: -0.201 Prob(JB): 0.00
Kurtosis: 2.852 Cond. No. 24.6


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

From the above we can reject the null hypothesis for three of the four groups: students who began learning the test language at 7 to 9 years, 10 to 12 years, or 13 years or older score significantly lower in reading than the baseline, and the penalty grows with starting age. For example, a student who did not begin learning the test language until age 13 or older would be expected to score about 70 points below the baseline. The 4 to 6 years group scores about 1.6 points lower, but that difference is not statistically significant. Note that the model was fit on all 480,531 rows, and rows with a missing age have zeros in every dummy column, so the baseline here includes those students as well as the 0 to 3 years group.

The p-values for the 7 to 9, 10 to 12, and 13 years or older groups are reported as 0.000, below our $\alpha$ of 0.05; the p-value for the 4 to 6 years group is 0.121, which is not.
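
As a quick check on the `4 to 6 years` row of the table, its t statistic and p-value can be recomputed from the reported coefficient and standard error. This sketch uses a standard normal approximation to the t distribution (an assumption on my part; it is reasonable here only because the residual degrees of freedom are very large).

```python
import math

# Coefficient and standard error for '4 to 6 years' as reported above
coef, std_err = -1.6168, 1.042
alpha = 0.05

t_stat = coef / std_err

# Two-sided p-value under a standard normal approximation:
# p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(round(t_stat, 3), round(p_value, 3), p_value < alpha)
```

The result agrees with the table to three decimals and is above $\alpha$, so this group's difference from the baseline is not statistically significant.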

The less time a child had to learn the test language, the worse they performed on reading literacy.
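
To make the interpretation of the coefficients concrete: in a regression on an intercept plus group dummies, the intercept is the baseline group's mean and each coefficient is that group's mean minus the baseline mean. The sketch below demonstrates this on made-up scores using NumPy's least-squares solver instead of statsmodels; group `a` plays the role of the baseline.

```python
import numpy as np
import pandas as pd

# Made-up scores for three groups; 'a' plays the role of the baseline
toy = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b', 'c', 'c'],
    'score': [500.0, 520.0, 480.0, 490.0, 430.0, 450.0],
})

# Design matrix: intercept plus one dummy per non-baseline group
X = np.column_stack([
    np.ones(len(toy)),
    (toy['group'] == 'b').astype(float),
    (toy['group'] == 'c').astype(float),
])
coefs, *_ = np.linalg.lstsq(X, toy['score'].to_numpy(), rcond=None)

means = toy.groupby('group')['score'].mean()

print(coefs)                                   # intercept, b - a, c - a
print(means['b'] - means['a'], means['c'] - means['a'])
```

One consequence is that the baseline is whatever rows have zeros in every dummy column. Restricting the frame to rows with a known age first (for example with `dropna(subset=['age_start_learn'])`) would make the baseline the 0 to 3 years group alone.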

Does Advantaged/Disadvantaged Status affect academic performance?

Here our null hypothesis is that mean overall academic performance is not affected by whether the student is advantaged or disadvantaged: mean performance for those groups is equal to the overall mean performance.

$$\large H_{0}: \mu_{adv/dis} = \mu$$

Our alternative hypothesis is that, for students who are advantaged or disadvantaged, mean performance is not equal to the mean performance of the overall group:

$$\large H_{1}: \mu_{adv/dis} \neq \mu$$

Our $\alpha$ (alpha) is 0.05.

In [48]:
dt = df_new.dropna()

lm = sm.OLS(
    dt['overall_literacy'],
    dt[[
        'intercept', 'disadvantaged', 'advantaged'
    ]],
)
results = lm.fit()
results.summary()
Out[48]:
OLS Regression Results
Dep. Variable: overall_literacy R-squared: 0.065
Model: OLS Adj. R-squared: 0.065
Method: Least Squares F-statistic: 1389.
Date: Wed, 29 May 2019 Prob (F-statistic): 0.00
Time: 22:39:28 Log-Likelihood: -2.3523e+05
No. Observations: 39905 AIC: 4.705e+05
Df Residuals: 39902 BIC: 4.705e+05
Df Model: 2
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
intercept 478.7766 0.538 890.277 0.000 477.722 479.831
disadvantaged -38.4234 1.168 -32.883 0.000 -40.714 -36.133
advantaged 42.4751 1.253 33.893 0.000 40.019 44.931
Omnibus: 103.871 Durbin-Watson: 1.238
Prob(Omnibus): 0.000 Jarque-Bera (JB): 103.401
Skew: 0.117 Prob(JB): 3.52e-23
Kurtosis: 2.913 Cond. No. 3.21


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

From our analysis we can reject the null hypothesis and find:

$$\large H_{1} : \mu_{disadvantaged} < \mu$$

$$\large H_{1} : \mu_{advantaged} > \mu$$

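Relative to students who are neither advantaged nor disadvantaged, being advantaged is associated with about +42 points and being disadvantaged with about -38 points. Both p-values are reported as 0.000, which is below our $\alpha$ of 0.05. (This model was fit on the 39,905 rows that remain after `dropna()`, which also removes rows with a missing age_start_learn.)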
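
The coefficients translate directly into fitted group means. The short calculation below uses only the numbers reported in the regression summary; the dictionary keys are labels chosen for illustration.

```python
# Coefficients as reported in the regression summary above
intercept = 478.7766
coef_disadvantaged = -38.4234
coef_advantaged = 42.4751

# With two 0/1 indicators, each group's fitted mean is a simple sum
fitted = {
    'neither': intercept,
    'disadvantaged': intercept + coef_disadvantaged,
    'advantaged': intercept + coef_advantaged,
}
gap = fitted['advantaged'] - fitted['disadvantaged']

for name, value in fitted.items():
    print(f'{name}: {value:.1f}')
print(f'advantaged - disadvantaged gap: {gap:.1f}')
```

The fitted gap between the advantaged and disadvantaged groups is roughly 81 points.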

Advantaged children outperform disadvantaged children.

Do students in freer countries perform differently from students in less free countries?

Here our null hypothesis is that mean overall academic performance is not affected by the civil liberties rating of a student's country: mean performance for those students is equal to the overall mean performance.

$$\large H_{0}: \mu_{civ lib} = \mu$$

Our alternative hypothesis is that, for students in freer countries, mean performance is not equal to the mean performance of the overall group:

$$\large H_{1}: \mu_{civ lib} \neq \mu$$

Our $\alpha$ (alpha) is 0.05.

In [49]:
lm = sm.OLS(
    df_new['overall_literacy'],
    df_new[[
        'intercept', 'civil_liberties'
    ]],
)
results = lm.fit()
results.summary()
Out[49]:
OLS Regression Results
Dep. Variable: overall_literacy R-squared: 0.046
Model: OLS Adj. R-squared: 0.046
Method: Least Squares F-statistic: 2.331e+04
Date: Wed, 29 May 2019 Prob (F-statistic): 0.00
Time: 22:39:28 Log-Likelihood: -2.8646e+06
No. Observations: 480531 AIC: 5.729e+06
Df Residuals: 480529 BIC: 5.729e+06
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
intercept 501.2866 0.235 2136.622 0.000 500.827 501.746
civil_liberties -13.8785 0.091 -152.674 0.000 -14.057 -13.700
Omnibus: 1108.730 Durbin-Watson: 0.947
Prob(Omnibus): 0.000 Jarque-Bera (JB): 962.267
Skew: 0.060 Prob(JB): 1.11e-209
Kurtosis: 2.816 Cond. No. 4.94


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Here we find a negative relationship between the civil liberties rating and overall academic performance. For each +1 on the civil liberties rating we expect the overall literacy score to decline by about 14 points. Remember, the civil liberties scale runs from 1 (most free) to 7 (least free). That means that children in less free countries tend to perform worse than children in freer societies. The model explains only a small share of the variance (R-squared of 0.046), but the p-value is reported as 0.000, which is below our $\alpha$ of 0.05, so the relationship is statistically significant.

China, a country rated 6 (least free) on our scale, consistently performs at the top in our academic scores, so I was curious how much the effect would change if we excluded China. We might expect the gap to widen.

In [50]:
# Exclude mainland China (Shanghai) and Macao, but leave Taiwan and Hong Kong
dt = df_new.query('country != "China (Shanghai)" and country != "Macao-China"')
In [51]:
lm = sm.OLS(
    dt['overall_literacy'],
    dt[[
        'intercept', 'civil_liberties'
    ]],
)
results = lm.fit()
results.summary()
Out[51]:
OLS Regression Results
Dep. Variable: overall_literacy R-squared: 0.086
Model: OLS Adj. R-squared: 0.086
Method: Least Squares F-statistic: 4.409e+04
Date: Wed, 29 May 2019 Prob (F-statistic): 0.00
Time: 22:39:28 Log-Likelihood: -2.7891e+06
No. Observations: 470019 AIC: 5.578e+06
Df Residuals: 470017 BIC: 5.578e+06
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
intercept 510.9616 0.235 2170.142 0.000 510.500 511.423
civil_liberties -20.1750 0.096 -209.978 0.000 -20.363 -19.987
Omnibus: 792.984 Durbin-Watson: 1.002
Prob(Omnibus): 0.000 Jarque-Bera (JB): 657.114
Skew: 0.015 Prob(JB): 2.04e-143
Kurtosis: 2.819 Cond. No. 4.84


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Indeed, the effect increases when we exclude China: each +1 on the civil liberties scale is associated with a decline of about 20 points in the overall literacy score.
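
To see what the two fitted lines imply at the ends of the observed range, the calculation below plugs ratings of 1 and 6 into each model, using the intercepts and slopes reported in the two summaries. The `predict` helper is hypothetical, and these are straight-line model predictions, not observed group means.

```python
# (intercept, slope) pairs as reported in the two summaries above
all_countries = (501.2866, -13.8785)
excluding_china = (510.9616, -20.1750)


def predict(model, rating):
    """Fitted overall literacy for a given civil liberties rating."""
    intercept, slope = model
    return intercept + slope * rating


for rating in (1, 6):
    print(rating,
          round(predict(all_countries, rating), 1),
          round(predict(excluding_china, rating), 1))
```

With all countries included, the predicted difference between ratings 1 and 6 is about 69 points; with Shanghai and Macao excluded it is about 101 points.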

Students in freer countries perform better academically than children in less free countries.
